
Nesterov acceleration



Continuized Accelerations of Deterministic and Stochastic Gradient Descents, and of Gossip Algorithms

Neural Information Processing Systems

We introduce the "continuized" Nesterov acceleration, a close variant of Nesterov acceleration whose variables are indexed by a continuous time parameter. The two variables continuously mix following a linear ordinary differential equation and take gradient steps at random times. This continuized variant benefits from the best of the continuous and the discrete frameworks: as a continuous process, one can use differential calculus to analyze convergence and obtain analytical expressions for the parameters; but a discretization of the continuized process can be computed exactly, with convergence rates similar to those of Nesterov's original acceleration. We show that the discretization has the same structure as Nesterov acceleration, but with random parameters. We provide continuized Nesterov acceleration under deterministic as well as stochastic gradients, with either additive or multiplicative noise. Finally, using our continuized framework and expressing the gossip averaging problem as the stochastic minimization of a certain energy function, we provide the first rigorous acceleration of asynchronous gossip algorithms.
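As a rough illustration of the continuized scheme described in this abstract, the sketch below simulates the two coupled variables on a simple quadratic: between gradient steps arriving at Poisson-distributed times, one variable relaxes toward the other along a linear ODE (solved in closed form), and at each arrival both variables take a gradient step. The step sizes, mixing rate, and the simplification of mixing only x toward z are illustrative choices, not the tuned parameters from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Quadratic objective f(x) = 0.5 * x^T H x (strongly convex)
H = np.diag([1.0, 10.0])
grad = lambda x: H @ x

# Illustrative step sizes and mixing rate (not the paper's tuned values)
gamma, gamma_p = 0.05, 0.02   # gradient step sizes for x and z
eta = 0.5                     # mixing rate of the linear ODE

x = np.array([5.0, -3.0])
z = x.copy()
for _ in range(500):
    dt = rng.exponential(1.0)          # gradient steps arrive at Poisson times
    # Between jumps, x relaxes toward z following dx/dt = eta * (z - x);
    # this linear ODE has the closed-form solution below.
    x = z + (x - z) * np.exp(-eta * dt)
    # At the jump, both variables take a gradient step at the point x.
    g = grad(x)
    x = x - gamma * g
    z = z - gamma_p * g

print(0.5 * x @ H @ x)  # objective value, close to the minimum value 0
```

Because the inter-jump mixing is available in closed form, the discretization is exact, which is the point the abstract emphasizes: no ODE-solver error is introduced.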


Analytical Study of Momentum-Based Acceleration Methods in Paradigmatic High-Dimensional Non-Convex Problems

Neural Information Processing Systems

In this work, we use dynamical mean-field theory techniques to describe analytically the average dynamics of momentum-based acceleration methods in a prototypical non-convex model: the (spiked) matrix-tensor model. We derive a closed set of equations that describe the behaviour of heavy-ball momentum and Nesterov acceleration in the infinite-dimensional limit.
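For readers unfamiliar with the two methods analyzed in this abstract, the sketch below contrasts the heavy-ball and Nesterov update rules on a simple non-convex double-well function (a stand-in; the paper studies the spiked matrix-tensor model). Step size and momentum values are arbitrary illustrative choices.

```python
import numpy as np

# Toy smooth non-convex objective (not the spiked matrix-tensor model)
f = lambda x: 0.25 * x**4 - 0.5 * x**2          # double well, minima at x = +/-1
grad = lambda x: x**3 - x

def heavy_ball(x0, lr=0.05, beta=0.9, steps=200):
    # Polyak heavy ball: momentum on past iterate differences,
    # gradient evaluated at the current iterate
    x, x_prev = x0, x0
    for _ in range(steps):
        x, x_prev = x - lr * grad(x) + beta * (x - x_prev), x
    return x

def nesterov(x0, lr=0.05, beta=0.9, steps=200):
    # Nesterov: gradient evaluated at the extrapolated (look-ahead) point
    x, x_prev = x0, x0
    for _ in range(steps):
        y = x + beta * (x - x_prev)
        x, x_prev = y - lr * grad(y), x
    return x

print(heavy_ball(2.5), nesterov(2.5))  # both settle near a minimizer x = +/-1
```

The only structural difference is where the gradient is evaluated: at the iterate itself (heavy ball) versus at the extrapolated point (Nesterov), which is precisely the distinction whose high-dimensional consequences the paper quantifies.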




On the Effectiveness of the z-Transform Method in Quadratic Optimization

Bach, Francis

arXiv.org Artificial Intelligence

Characterizing the convergence of real-valued or vector-valued sequences is a key theoretical problem in data science, where the sequence index typically corresponds to the number of iterations of an iterative algorithm (such as in optimization and signal processing) or the number of observations (as in statistics and machine learning). This characterization can be done in two main ways, asymptotically or non-asymptotically. In an asymptotic analysis, an asymptotic equivalent of the sequence is identified, which readily allows comparisons with other algorithms; however, without further analysis, the behavior at any finite time cannot be controlled. This is exactly what non-asymptotic analysis aims to achieve, by providing bounds that are valid even for a finite index, but then only providing bounds that cannot always be compared. While the two approaches have their own merits, in this paper, we focus on asymptotic analysis and sequences that tend to their limit at a sub-exponential rate that is a power of the sequence index. The main goal of this paper is to show how a classical tool from signal processing, control theory, and electrical engineering (Oppenheim et al., 1996), the z-transform method (Jury, 1964), can be used in this context with a striking efficiency at obtaining asymptotic equivalents for the class of algorithms that can be seen as iterations of potentially random linear operators in a Hilbert space. This includes gradient descent for quadratic optimization problems as well as its accelerated and stochastic variants (Nesterov, 2018), Landweber iterations in inverse problems (Benning and Burger, 2018), or gossip algorithms in distributed computing (Boyd et al., 2006).
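The class of algorithms mentioned in this abstract, iterations of linear operators, can be illustrated concretely: gradient descent on a quadratic iterates the linear operator I - gamma*H, and in finite dimension the error contracts geometrically at the spectral radius of that operator. The sketch below (with an arbitrary toy H) checks the empirical per-step contraction against the spectral prediction; the sub-exponential, power-law regime studied in the paper arises in infinite-dimensional settings.

```python
import numpy as np

# Gradient descent on f(x) = 0.5 x^T H x is iteration of the linear
# operator A = I - gamma * H; its spectral radius dictates the rate.
H = np.diag([0.5, 1.0, 4.0])
gamma = 0.2
A = np.eye(3) - gamma * H
rho = max(abs(1 - gamma * np.diag(H)))   # predicted geometric rate

x = np.array([1.0, 1.0, 1.0])
errs = [np.linalg.norm(x)]
for _ in range(50):
    x = A @ x
    errs.append(np.linalg.norm(x))

# The empirical per-step contraction approaches the spectral radius
empirical = (errs[-1] / errs[-11]) ** (1 / 10)
print(rho, empirical)
```

In a Hilbert space the spectrum can accumulate at 1, and the error then decays as a power of the iteration index rather than geometrically; extracting that power is where the z-transform machinery of the paper comes in.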


Nesterov acceleration despite very noisy gradients

Neural Information Processing Systems

We present a generalization of Nesterov's accelerated gradient descent algorithm. Our algorithm (AGNES) provably achieves acceleration for smooth convex and strongly convex minimization tasks with noisy gradient estimates if the noise intensity is proportional to the magnitude of the gradient at every point. Nesterov's method converges at an accelerated rate if the constant of proportionality is below 1, while AGNES accommodates any signal-to-noise ratio. The noise model is motivated by applications in overparametrized machine learning. AGNES requires only two parameters in convex and three in strongly convex minimization tasks, improving on existing methods.
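The multiplicative noise model described in this abstract can be sketched as follows: the gradient oracle returns the true gradient plus Gaussian noise whose magnitude is proportional to the gradient norm, so the noise vanishes at the minimizer. The code below runs a plain constant-momentum Nesterov iteration with such an oracle on a quadratic; it is not the AGNES parameterization, and all step-size, momentum, and noise values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)

# Multiplicative-noise oracle: noise magnitude proportional to ||grad f(x)||,
# mimicking overparametrized models where noise vanishes at interpolation.
def noisy_grad(x, H, sigma):
    g = H @ x
    return g + sigma * np.linalg.norm(g) * rng.standard_normal(g.shape)

# Plain constant-momentum Nesterov iteration with the noisy oracle
# (an illustration of the noise model, not the AGNES algorithm itself).
H = np.diag([1.0, 10.0])
x = np.array([3.0, -2.0])
x_prev = x.copy()
lr, beta, sigma = 0.05, 0.8, 0.2   # sigma well below the critical threshold 1
for _ in range(400):
    y = x + beta * (x - x_prev)
    x, x_prev = y - lr * noisy_grad(y, H, sigma), x

print(0.5 * x @ H @ x)  # noise shrinks with the gradient, so x approaches 0
```

Because the noise scales with the gradient, the iterates converge despite the oracle never becoming exact away from initialization; with additive (constant-variance) noise the same iteration would stall at a noise floor.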


Nesterov Acceleration for Ensemble Kalman Inversion and Variants

Vernon, Sydney, Bach, Eviatar, Dunbar, Oliver R. A.

arXiv.org Artificial Intelligence

Ensemble Kalman inversion (EKI) is a derivative-free, particle-based optimization method for solving inverse problems. It can be shown that EKI approximates a gradient flow, which allows the application of methods for accelerating gradient descent. Here, we show that Nesterov acceleration is effective in speeding up the reduction of the EKI cost function on a variety of inverse problems. We also implement Nesterov acceleration for two EKI variants, unscented Kalman inversion and ensemble transform Kalman inversion. Our specific implementation takes the form of a particle-level nudge that is demonstrably simple to couple in a black-box fashion with any existing EKI variant, comes with no additional computational expense, and requires no additional tuning hyperparameters. This work shows a pathway for future research to translate advances in gradient-based optimization into advances in gradient-free Kalman optimization.
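A schematic of such a particle-level nudge on a toy linear inverse problem is sketched below: before each basic EKI update, every particle is extrapolated with a Nesterov-style momentum coefficient. The momentum schedule, ensemble size, and test problem are illustrative assumptions, not the paper's exact implementation.

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy linear inverse problem: recover u_true from y = G @ u_true,
# with observation noise covariance Gamma.
G = np.array([[1.0, 0.5], [0.0, 2.0]])
u_true = np.array([1.0, -1.0])
Gamma = 0.01 * np.eye(2)
y = G @ u_true

def eki_step(U):
    # Basic EKI update from ensemble covariances (U: n_particles x dim)
    P = U @ G.T                               # forward map of each particle
    du = U - U.mean(axis=0)
    dp = P - P.mean(axis=0)
    C_up = du.T @ dp / (len(U) - 1)
    C_pp = dp.T @ dp / (len(U) - 1)
    K = C_up @ np.linalg.inv(C_pp + Gamma)
    return U + (y - P) @ K.T

# Nesterov-accelerated EKI: extrapolate every particle with momentum
# before the Kalman step (schematic; schedule below is an assumption).
U = rng.standard_normal((50, 2))
U_prev = U.copy()
for k in range(1, 30):
    beta = 1 - 3 / (k + 2)        # common Nesterov-style schedule
    V = U + beta * (U - U_prev)
    U, U_prev = eki_step(V), U

print(U.mean(axis=0))  # ensemble mean approaches u_true = [1, -1]
```

The nudge touches only the particle positions before the standard update, which is what makes it easy to bolt onto any EKI variant without new hyperparameters or extra forward-model evaluations.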